Device and method for generating a classifier for automatically sorting objects

ABSTRACT

The invention is in the field of automatic systems for electronic classification of objects which are characterized by electronic attributes. A device and a method for generating a classifier for automatically sorting objects, which are respectively characterized by electronic attributes, are provided, in particular a classifier for automatically sorting manufactured products into up-to-standard products and defective products, having a storage device for storing a set of electronic training data, which comprises a respective electronic attribute set for training objects, and having a processor device for processing the electronic training data, a dimension (d) being determined by the number of attributes in the respective electronic attribute set. The processor device has discretization means for automatically discretizing a function space (V), which is defined over the real numbers (R^(d)), into subspaces (V_(N), N=2, 3, . . .) by means of a sparse grid technique and processing the electronic training data with the aid of a processor device.

[0001] The invention is in the field of automatic systems for electronic classification of objects which are characterized by electronic attributes.

[0002] Such systems are used, for example, in conjunction with the manufacture of products in large piece numbers. In the course of production of an industrial mass-produced product, sensor means are used for automatically acquiring various electronic data on the properties of the manufactured products in order, for example, to check the observance of specific quality criteria. This can involve, for example, the dimensions, the weight, the temperature or the material composition of the product. The acquired electronic data are to be used to detect defective products automatically, select them and subsequently appraise them manually. The first step in this process is for historical data on manufactured products, for example on the products produced in past manufacturing processes, to be stored electronically in a database. A database accessing means of a computer installation is used to feed the historical data in the course of a classification method to a processor device which uses the historical data to generate automatically characteristic profiles of the two quality classes “Product acceptable” and “Product defective” and to store them in a classifier file. What is termed a classifier is formed automatically in this way with the aid of machine learning.

[0003] During the production process for manufacturing the products to be tested and/or classified, the electronic data supplied for each manufactured product by the sensors are evaluated in the online classification mode by an online classification device on the basis of the classifier file or the classifier, and the tested product is automatically assigned to one of the two quality classes. If the class “Product defective” is involved, the appropriate product is selected and sent for manual appraisal.

[0004] A substantial problem in the case of the classifiers described by the example is currently to be found in the large number of the acquired historical data. In the course of the comprehensive networking of computer-controlled production installations or other computer installations via the Internet and Intranets, as well as the corporate centralization of electronic data, an explosive growth is currently taking place in the electronic data stocks of companies. Many databases already contain millions and billions of customer and/or product data. The processing of large data stocks is therefore playing an ever greater role in all fields of data processing, not only in conjunction with the production process outlined above. On the one hand, the information, which can be derived automatically from historical data which are present in very large numbers, is “more valuable” with regard to the formation of the classifier, since a large number of historical data are used to generate it automatically, while on the other hand there exists the problem of managing the number of historical data efficiently with regard to the time expended when constructing the classifier.

[0005] Known classification methods such as described, for example, in the printed publication U.S. Pat. No. 5,640,492 are based for the most part on decision trees or neural networks. Decision trees admittedly permit automatic classification over large electronic data volumes, but generally exhibit a low quality of classification, since they treat the attributes of the data separately and not in a multivariate fashion.

[0006] The best conventional classification methods such as backpropagation networks, radial basis functions or support vector machines can mostly be formulated as regularization networks. Regularization networks minimize an error functional which comprises a weighted sum of an approximation error term and of a smoothing operator. The known machine learning methods execute this minimization over the space of the data points, whose size is a function of the number of the acquired historical data, and are therefore suitable only for historical data records which are small- to medium-sized.

[0007] It is usually necessary in this case to solve the following problem of classification and/or regression. M data points exist in a d-dimensional space: x_(i), i=1, . . . , M, x_(i)∈R^(d). The data points are assigned function values y_(i), i=1, . . . , M, with y_(i)∈R (regression) or y_(i)∈{−1;+1} (classification). The training set is therefore yielded as S={(x_(i), y_(i))}, i=1, . . . , M. The following regularization problem now needs to be solved:

min R(ƒ)   (1)

[0008] ƒ∈V

[0009] with

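$\begin{matrix}{R(f) = \frac{1}{M}\sum\limits_{i = 1}^{M}C\left( f(x_{i}),y_{i} \right) + \lambda\,\varphi(f),} & (2)\end{matrix}$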

[0010] where

[0011] C(x,y) is an error functional, for example C(x,y)=(x−y)²;

[0012] φ(ƒ) is a smoothing operator, φ(ƒ)=∥Pƒ∥₂², for example with Pƒ=∇ƒ;

[0013] ƒ is a regression/classification function with the required smoothness properties for the operator P; and

[0014] λ is a regularization parameter.

[0015] The classification function ƒ is usually determined in this case as a weighted sum of ansatz functions φ_(i) over the data points: $\begin{matrix}{f_{C}(x) = \sum\limits_{i = 1}^{M}\alpha_{i}\phi_{i}(x).} & (3)\end{matrix}$

[0016] The known approach to a solution leads essentially to two problems: (i) because of the global nature of the ansatz functions φ_(i) and the number of coefficients α_(i) (equal to the number M of data points), the solution to the regression problem is very time-consuming and sometimes impossible for larger data volumes, since it requires the use of matrices of size M×M; (ii) the application of the classification function to new data records in the course of online classification is very time-consuming, since summing has to be carried out over all functions φ_(i) (i=1, . . . , M).

[0017] It is the object of the invention to create a possibility to use automatic systems for the electronic classification of objects, which are characterized by electronic attributes, even for applications in which a very large number of data points are present.

[0018] The object is achieved according to the invention by means of the independent claims.

[0019] An essential idea which is covered by the invention consists in the application of the sparse grid technique. For this purpose, the function ƒ is not generated in accordance with the formulation of (3); instead, a discretization of the space V is undertaken, V_(N)⊂V being a finite-dimensional subspace of V, and N being the dimension of the subspace V_(N). The function ƒ is determined as $\begin{matrix}{f_{N}(x) = \sum\limits_{i = 1}^{N}\alpha_{i}\phi_{i}(x).} & (4)\end{matrix}$

[0020] The regularization problem in the space V_(N) determining ƒ_(N) is then: $\begin{matrix}{R(f_{N}) = \frac{1}{M}\sum\limits_{i = 1}^{M}\left( f_{N}(x_{i}) - y_{i} \right)^{2} + \lambda\left\| Pf_{N} \right\|_{L_{2}}^{2},\quad\text{with}\quad C(x,y) = (x - y)^{2}\quad\text{and}\quad\varphi(f) = \left\| Pf \right\|_{2}^{2}.} & (5)\end{matrix}$

[0021] By contrast with conventional methods, the sparse grid space is selected as subspace V_(N). This avoids the problems of the prior art. The number N of the coefficients α_(i) to be determined depends only on the discretization of the space V. The effort for solving (5) scales linearly with the number M of data points. Consequently, the method can be applied for data volumes of virtually any desired size. The classification function ƒ_(N) is built up only from N ansatz functions and can therefore be evaluated quickly in the application.

[0022] The essential advantage which the invention provides by comparison with the prior art consists in that the outlay for generating the classifier scales only linearly with the number of data points, and thus the classifier can be generated for electronic data volumes of virtually any desired size. A further advantage consists in the higher speed of application of the classifier to new data records, that is to say in the quick online classification.

[0023] The sparse grid classification method can also be used to evaluate customer, financial and corporate data.

[0024] Advantageous developments of the invention are disclosed in the dependent subclaims.

[0025] The invention is explained in more detail below with the aid of exemplary embodiments and with reference to a drawing, in which:

[0026] FIG. 1 shows a schematic block diagram of a device for automatically generating a classifier and/or for online classification;

[0027] FIG. 2 shows a schematic block diagram for explaining a method for automatically generating a classifier by means of sparse grid technology;

[0028] FIG. 3 shows a schematic block diagram for explaining a method for automatically applying an online classification;

[0029] FIGS. 4A and 4B show an illustration of a two-dimensional and, respectively, a three-dimensional sparse grid (level n=5);

[0030] FIG. 5 shows the combination technique for level 4 in 2 dimensions; and

[0031] FIGS. 6A and 6B show a spiral data record with sparse grids for level 6 and level 8, respectively.

[0032] The sparse grid classification method is described in detail below.

[0033] Consideration is given firstly in this case to an arbitrary discretization V_(N) of the function space V, which leads to the regularization problem (5). Substituting the ansatz function (4) in the regularization formulation (5) yields $\begin{matrix}{R(f_{N}) = \frac{1}{M}\sum\limits_{i = 1}^{M}\left( \sum\limits_{j = 1}^{N}\alpha_{j}\phi_{j}(x_{i}) - y_{i} \right)^{2} + \lambda\sum\limits_{i = 1}^{N}\sum\limits_{j = 1}^{N}\alpha_{i}\alpha_{j}\left( P\phi_{i},P\phi_{j} \right)_{L_{2}}.} & (6)\end{matrix}$

[0034] Differentiation with respect to α_(k), k=1, . . . , N yields $\begin{matrix}{0 = \frac{\partial R(f_{N})}{\partial\alpha_{k}} = \frac{2}{M}\sum\limits_{i = 1}^{M}\left( \sum\limits_{j = 1}^{N}\alpha_{j}\phi_{j}(x_{i}) - y_{i} \right) \cdot \phi_{k}(x_{i}) + 2\lambda\sum\limits_{j = 1}^{N}\alpha_{j}\left( P\phi_{j},P\phi_{k} \right)_{L_{2}}.} & (7)\end{matrix}$

[0035] This is equivalent to (k=1, . . . , N) $\begin{matrix}{\sum\limits_{j = 1}^{N}\alpha_{j}\left\lbrack M\lambda\left( P\phi_{j},P\phi_{k} \right)_{L_{2}} + \sum\limits_{i = 1}^{M}\phi_{j}(x_{i}) \cdot \phi_{k}(x_{i}) \right\rbrack = \sum\limits_{i = 1}^{M}y_{i}\phi_{k}(x_{i}).} & (8)\end{matrix}$

[0036] This corresponds in matrix notation to the linear system

(λC+B·B ^(T))α=By.   (9)

[0037] Here, C is a square N×N matrix with entries C_(j,k)=M·(Pφ_(j),Pφ_(k))_(L2), j, k=1, . . . , N, and B is a rectangular N×M matrix with entries B_(j,i)=φ_(j)(x_(i)), i=1, . . . , M, j=1, . . . , N. The vector y contains the data y_(i) and has the length M. The unknown vector α contains the degrees of freedom α_(j) and has the length N.
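The assembly of system (9) can be sketched in one dimension. The following is a minimal illustration only, assuming piecewise linear hat functions on an equidistant grid over [0,1] and P=∇. The helper names hat, assemble, solve and classifier are hypothetical, and the dense Gaussian elimination merely stands in for the iterative solvers discussed elsewhere in this description.

```python
def hat(j, n, x):
    """Piecewise linear hat function centred at grid point j * 2**-n."""
    h = 2.0 ** -n
    return max(0.0, 1.0 - abs(x / h - j))


def assemble(xs, ys, n, lam):
    """Return (A, rhs) with A = lam*C + B*B^T and rhs = B*y, as in (9)."""
    M, N, h = len(xs), 2 ** n + 1, 2.0 ** -n
    # B[j][i] = phi_j(x_i): an N x M matrix.
    B = [[hat(j, n, x) for x in xs] for j in range(N)]
    # C[j][k] = M * (phi_j', phi_k')_L2: tridiagonal 1D stiffness matrix.
    C = [[0.0] * N for _ in range(N)]
    for j in range(N):
        C[j][j] = M * (1.0 if j in (0, N - 1) else 2.0) / h
        if j + 1 < N:
            C[j][j + 1] = C[j + 1][j] = -M / h
    A = [[lam * C[j][k] + sum(B[j][i] * B[k][i] for i in range(M))
          for k in range(N)] for j in range(N)]
    rhs = [sum(B[j][i] * ys[i] for i in range(M)) for j in range(N)]
    return A, rhs


def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (illustration only)."""
    n = len(b)
    T = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(T[r][c]))
        T[c], T[p] = T[p], T[c]
        for r in range(c + 1, n):
            f = T[r][c] / T[c][c]
            for k in range(c, n + 1):
                T[r][k] -= f * T[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (T[r][n] - sum(T[r][k] * x[k] for k in range(r + 1, n))) / T[r][r]
    return x


def classifier(alpha, n, x):
    """Evaluate f_N(x) = sum_j alpha_j * phi_j(x)."""
    return sum(a * hat(j, n, x) for j, a in enumerate(alpha))
```

Note that the size of the system depends only on N=2^n+1, whereas the number M of data points enters only through the sums used to build B·B^T and By.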

[0038] Various minimization problems in d-dimensional space occur depending on the regularization operator. If, for example, the gradient P=∇ is used in the regularization expression in (2), the result is a Poisson problem with an additional term which corresponds to the interpolation problem. The natural boundary conditions for such a differential equation in, for example, Ω=[0,1]^(d) are Neumann conditions. The discretization (4) now yields the system (9) of linear equations, C corresponding to a discrete Laplace matrix. The system must now be solved in order to obtain the classifier ƒ_(N).

[0039] The representation so far has not been specific as to which finite dimensional subspace V_(N) and which type of basis functions are to be used. By contrast with conventional data mining approaches, which operate with ansatz functions which are assigned to data points, use is now made of a specific grid in feature space in order to determine the classifier with the aid of these grid points. This is similar to the numerical treatment of partial differential equations. For reasons of simplicity, the further description will be restricted to the case of x_(i)∈Ω=[0,1]^(d). This situation can always be achieved by a suitable rescaling of the data space. A conventional finite element discretization would now employ an equidistant grid Ω_(n) with a grid width h_(n)=2^(−n) in each coordinate direction, n being the refinement level. In the following, the gradient P=∇ is used in the regularization expression in (2). Let j be the multi-index (j₁, . . . , j_(d))∈N^(d). A finite element method with piecewise d-linear ansatz and test functions φ_(n,j)(x) on the grid Ω_(n) would now yield $\left( f_{N}(x) = \right)f_{n}(x) = \sum\limits_{j_{1} = 0}^{2^{n}}\cdots\sum\limits_{j_{d} = 0}^{2^{n}}\alpha_{n,j}\phi_{n,j}(x)$

[0040] and the variational formulation (6)-(9) would lead to the discrete system of equations

(λC_(n)+B_(n)·B_(n)^(T))α_(n)=B_(n)y   (10)

[0041] of size (2^(n)+1)^(d) and with matrix entries in accordance with (9). It may be pointed out that ƒ_(n) lives in the space

V_(n):=span{φ_(n,j), j_(t)=0, . . . , 2^(n), t=1, . . . , d}.
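A piecewise d-linear basis function of the kind used above can be written as a product of one-dimensional hat functions. The sketch below is illustrative only; the function names hat_1d and phi are hypothetical. A separate level is accepted for each coordinate direction, so the full-grid case corresponds to all levels being equal to n.

```python
def hat_1d(level, index, x):
    """1D hat function with grid width 2**-level, centred at index * 2**-level."""
    h = 2.0 ** -level
    return max(0.0, 1.0 - abs(x / h - index))


def phi(levels, indices, x):
    """d-linear basis function: product of 1D hats, one per coordinate direction."""
    value = 1.0
    for l_t, j_t, x_t in zip(levels, indices, x):
        value *= hat_1d(l_t, j_t, x_t)
    return value
```

For example, phi((2, 2), (1, 3), (0.25, 0.75)) evaluates the basis function at its own grid point, where it takes the value 1.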

[0042] The discrete problem (10) could be treated in principle by means of a suitable solver such as the conjugate gradient method, a multigrid method or another efficient iteration method. However, this direct application of a finite element discretization and of a suitable linear solver to the existing system of equations is not possible for d-dimensional problems if d is greater than 4.

[0043] The number of grid points would be of the order of O(h_(n)^(−d))=O(2^(nd)) and, in the best case, when an effective technique such as the multigrid method is used, the number of operations is of the same order of magnitude. The “curse” of dimensionality is to be seen here: the complexity of the problem grows exponentially with d. At least for d>4 and a sensible value of n, the system of linear equations that is produced can no longer be stored and solved on the largest current parallel computers.
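The exponential growth can be made concrete with a short calculation of the full-grid system size (2^n+1)^d stated above. This is only an illustrative sketch, and the function name full_grid_points is hypothetical.

```python
def full_grid_points(n, d):
    """Number of unknowns of the full-grid system (10): (2**n + 1)**d."""
    return (2 ** n + 1) ** d


if __name__ == "__main__":
    for d in (2, 4, 10):
        print(d, full_grid_points(5, d))
```

At level n=5, two dimensions give 33^2=1089 unknowns, whereas ten dimensions already give 33^10, which exceeds 10^15.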

[0044] In order to reduce the “curse” of dimension, the approach is therefore to use a sparse grid formulation: Let l=(l₁, . . . , l_(d))∈N^(d) be a multi-index. The problem is discretized and solved on a certain sequence of grids Ω_(l) with a uniform grid width h_(t)=2^(−l_(t)) in the t-th coordinate direction. These grids can have different grid widths for different coordinate directions. Consideration will be given in this regard to Ω_(l) with

l₁+ . . . +l_(d)=n+(d−1)−q, q=0, . . . , d−1, l_(t)>0.

[0045] Let us define L as$L:={\sum\limits_{q = 0}^{d - 1}{\sum\limits_{{l_{1} + \ldots + l_{d}} = {n + {({d - 1})} - q}}1.}}$
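The grids entering this definition can be enumerated directly. The following sketch is illustrative only; combination_grids and count_grids are hypothetical names, and the brute-force enumeration is chosen for clarity rather than efficiency.

```python
from itertools import product


def combination_grids(n, d):
    """Yield (q, l) for every multi-index l with l_t >= 1 and
    l_1 + ... + l_d = n + (d - 1) - q, for q = 0, ..., d - 1."""
    for q in range(d):
        target = n + (d - 1) - q
        for l in product(range(1, target + 1), repeat=d):
            if sum(l) == target:
                yield q, l


def count_grids(n, d):
    """The number L of grids (and hence of linear systems) to be treated."""
    return sum(1 for _ in combination_grids(n, d))
```

For level n=4 in d=2 dimensions this gives L=7: four grids with l₁+l₂=5 and three grids with l₁+l₂=4.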

[0046] The finite element approach with piecewise d-linear test functions $\begin{matrix}{\phi_{l,j}(x) := \prod\limits_{t = 1}^{d}\phi_{l_{t},j_{t}}(x_{t})\quad\text{yields}\quad f_{l}(x) = \sum\limits_{j_{1} = 0}^{2^{l_{1}}}\cdots\sum\limits_{j_{d} = 0}^{2^{l_{d}}}\alpha_{l,j}\phi_{l,j}(x)} & (12)\end{matrix}$

[0047] on the grid Ω_(l), and the variational formulation (6)-(9) results in the discrete system of equations

(λC_(l)+B_(l)·B_(l)^(T))α_(l)=B_(l)y   (13)

[0048] with the matrices

(C_(l))_(j,k)=M·(∇φ_(l,j),∇φ_(l,k)) and (B_(l))_(j,i)=φ_(l,j)(x_(i)),

[0049] j_(t), k_(t)=0, . . . , 2^(l_(t)), t=1, . . . , d, i=1, . . . , M, and the unknown vector (α_(l))_(j), j_(t)=0, . . . , 2^(l_(t)), t=1, . . . , d. These problems are then solved using a suitable method. The conjugate gradient method is used for this purpose together with a diagonal preconditioner. However, it is also possible to apply a suitable multigrid method with partial semi-coarsening. The discrete solutions ƒ_(l) are contained in the space

V_(l):=span{φ_(l,j), j_(t)=0, . . . , 2^(l_(t)), t=1, . . . , d}   (14)

[0050] of the piecewise d-linear functions on the grid Ω_(l).
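A conjugate gradient iteration with a diagonal preconditioner, as mentioned for the systems (13), can be sketched as follows. This is a generic textbook formulation for a dense symmetric positive definite matrix held as a list of lists; the function name pcg, the tolerance and the iteration limit are assumptions made for illustration and are not taken from this description.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by preconditioned CG."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # diagonal preconditioner
    x = [0.0] * n
    r = b[:]                                      # residual for the start x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(p)
        step = rz / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In practice the matrix λC_(l)+B_(l)·B_(l)^(T) would not be stored densely; only the matrix-vector product is needed by the iteration.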

[0051] It may be pointed out that, by comparison with (10), all these problems are now substantially reduced in size. Instead of a problem of size dim(V_(n))=O(h_(n)^(−d))=O(2^(nd)), we need to treat O(dn^(d−1)) problems of size dim(V_(l))=O(h_(n)^(−1))=O(2^(n)). Furthermore, these problems can be solved independently of one another, and this permits a simple parallelization (compare M. Griebel, THE COMBINATION TECHNIQUE FOR THE SPARSE GRID SOLUTION OF PDES ON MULTIPROCESSOR MACHINES, Parallel Processing Letters, 2, 1992, pages 61-70).

[0052] Finally, the results ƒ_(l)(x)=Σ_(j)α_(l,j)φ_(l,j)(x)∈V_(l) of the different grids Ω_(l) can be combined as follows: $\begin{matrix}{f_{n}^{(c)}(x) := \sum\limits_{q = 0}^{d - 1}( - 1)^{q}\begin{pmatrix}{d - 1} \\ q\end{pmatrix}\sum\limits_{l_{1} + \ldots + l_{d} = n + (d - 1) - q}f_{l}(x).} & (15)\end{matrix}$
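The combination formula (15) amounts to a weighted sum of the values delivered by the individual grids. The sketch below is illustrative only: the function name combine and the dictionary layout (level multi-index mapped to the value ƒ_(l)(x) at one point x) are assumptions, not part of this description.

```python
from math import comb


def combine(values, n, d):
    """Combine per-grid values f_l(x) according to formula (15).

    values maps each level multi-index l (a tuple of d positive integers)
    to the value f_l(x) of the corresponding subclassifier at one point x.
    """
    total = 0.0
    for l, f_l in values.items():
        q = n + (d - 1) - sum(l)      # which diagonal of grids l belongs to
        if 0 <= q <= d - 1:
            total += (-1) ** q * comb(d - 1, q) * f_l
    return total
```

A quick consistency check is that a function reproduced exactly on every grid, such as a constant, is returned unchanged, because the combination coefficients sum to one.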

[0053] The resulting function ƒ_(n)^((c)) lives in the sparse-grid space $V_{n}^{(s)} := \bigcup\limits_{\substack{l_{1} + \ldots + l_{d} = n + (d - 1) - q \\ q = 0,\ldots,d - 1,\; l_{t} > 0}}V_{l}.$

[0054] The sparse-grid space has a dimension dim(V_(n)^((s)))=O(h_(n)^(−1)(log(h_(n)^(−1)))^(d−1)). It is defined by a piecewise d-linear hierarchical tensor product basis (compare H.-J. Bungartz, DÜNNE GITTER UND DEREN ANWENDUNG BEI DER ADAPTIVEN LÖSUNG DER DREIDIMENSIONALEN POISSON-GLEICHUNG [Sparse grids and their application in the adaptive solution of the three-dimensional Poisson equation], Dissertation, Institut für Informatik, Technical University Munich, 1992). A sparse grid is illustrated in FIGS. 4A and 4B (level 5), respectively, for the two-dimensional and three-dimensional cases. FIG. 5 shows the grids which are required in the combination formula of level 4 in the two-dimensional case. It is also shown in FIG. 5 how the superimposition of the points in the sequence of the grids of the combination technique supplies a sparse grid of the corresponding level n.

[0055] It may be pointed out that the sum over the discrete functions from different spaces V_(l) in (15) requires the d-linear interpolation which precisely corresponds to the transformation to the representation on the hierarchical basis. Details are described in the following document: M. Griebel, M. Schneider, C. Zenger, A COMBINATION TECHNIQUE FOR THE SOLUTION OF SPARSE GRID PROBLEMS, Iterative Methods in Linear Algebra, P. de Groen and R. Beauwens, eds., IMACS, Elsevier, North Holland, 1992, pages 263-281. In the case illustrated, however, the function ƒ_(n)^((c)) is never set up explicitly. Instead of this, the solutions ƒ_(l) are held on the different grids Ω_(l) which occur in the combination formula. Each linear operator F over ƒ_(n)^((c)) can now easily be expressed with the aid of the combination formula (15), the operation of F being performed directly on the functions ƒ_(l), that is to say $\begin{matrix}{F\left( f_{n}^{(c)} \right) = \sum\limits_{q = 0}^{d - 1}( - 1)^{q}\begin{pmatrix}{d - 1} \\ q\end{pmatrix}\sum\limits_{l_{1} + \ldots + l_{d} = n + (d - 1) - q}F\left( f_{l} \right).} & (16)\end{matrix}$

[0056] If it is now required to evaluate a newly specified set of data points $\{\tilde{x}_{i}\}_{i = 1}^{\tilde{M}}$ (the test or evaluation data) with

$\tilde{y}_{i} := f_{n}^{(c)}(\tilde{x}_{i}),\quad i = 1,\ldots,\tilde{M},$

[0057] all that is required is to form the combination of the associated values for ƒ_(l) in accordance with (15). The evaluation of the various ƒ_(l) at the test points can be performed in a completely parallel fashion, and the summation essentially requires an all-reduce operation. It has been proved for elliptic partial differential equations of second order that the combination solution ƒ_(n)^((c)) is nearly as accurate as the full grid solution ƒ_(n), that is to say the discretization error satisfies

$\left\| e_{n}^{(c)} \right\|_{L_{p}} := \left\| f - f_{n}^{(c)} \right\|_{L_{p}} = O\left( h_{n}^{2}\log\left( h_{n}^{- 1} \right)^{d - 1} \right)$

[0058] assuming a slightly stronger smoothness requirement on ƒ by comparison with the full grid approach. The seminorm $\begin{matrix}{|f|_{\infty} := \left\| \frac{\partial^{2d}f}{\prod\limits_{j = 1}^{d}\partial x_{j}^{2}} \right\|_{\infty}} & (17)\end{matrix}$

[0059] is required to be bounded. A series expansion of the error is also required. Its existence is known for PDE model problems (compare H.-J. Bungartz, M. Griebel, D. Roschke, C. Zenger,

[0060] POINTWISE CONVERGENCE OF THE COMBINATION TECHNIQUE FOR THE LAPLACE EQUATION, East-West J. Numer. Math., 2, 1994, pages 21-45).

[0061] The combination technique is only one of various methods for solving problems on sparse grids. It may be pointed out that Galerkin, finite element, finite difference, finite volume and collocation approaches also exist; these operate directly with the hierarchical product basis on the sparse grid. However, the combination technique is conceptually simpler and easier to implement. Furthermore, it permits the reuse of standard solvers for its various subproblems, and can be parallelized in a simple way.

[0062] So far, only d-linear basis functions based on a tensor product approach have been mentioned (compare J. Garcke, M. Griebel, M. Thess, DATA MINING WITH SPARSE GRIDS, SFB 256 Preprint 675, Institute for Applied Mathematics, Bonn University, 2000). However, linear basis functions based on simplicial decompositions are also possible for the grids of the combination technique: Use is made for this purpose of what is termed Kuhn's triangulation (compare H. W. Kuhn, SOME COMBINATORIAL LEMMAS IN TOPOLOGY, IBM J. Res. Develop., 1960, pages 518-524). This case has been described in J. Garcke and M. Griebel, DATA MINING WITH SPARSE GRIDS USING SIMPLICIAL BASIS FUNCTIONS, KDD 2001 (accepted), 2001.

[0063] It is also possible to use other ansatz functions, for example functions of higher order or wavelets, as basis functions. Moreover, it is also possible to use both other regularization operators P and other cost functions C.

[0064] The use of the method is described below with reference to an example of quality assurance in the industrial sector.

[0065] In the course of the production of an industrial mass-produced item, various data on the product are acquired automatically by sensors. The aim is to use these data to select defective products automatically and appraise them manually. Acquired data/attributes can be, for example: dimensions of the product, weight, temperature, and/or material composition.

[0066] Each product is characterized by a plurality of attributes and therefore corresponds to a data record x_(i). The number of attributes forms the dimension d. There now exists a comprehensive historical product database in which all attributes (measured values) of the products are stored together with the information on their quality class (“acceptable”, “defective”) (y_(i)). Here, y_(i)=1 is to signify the quality class “Acceptable” and y_(i)=−1 is to signify the quality class “Defective”. The aim now is to use the product database to construct a classifier ƒ which permits the quality class of each new product to be predicted in online operation with the aid of the measured values of the product. Products classified as “Defective” are automatically selected for manual quality control.

[0067] A classification task is involved here. A device 1 for generating a classifier for the quality of the products is illustrated schematically in FIG. 1. Historical data must be present before a classifier can be generated. For this purpose, the data occurring in the production process 10 are acquired electronically by means of measurement sensors 20. This process can take place independently of the automatic generation of the classifier at an earlier point in time. The acquired data can be further preprocessed by means of a signal preprocessing device 30 by virtue of the fact that the signals are, for example, normalized or subjected to special transformations, for example Fourier or wavelet transformations, and possibly smoothed. Thereafter, the measured data are preferably stored in tabular form with the product attributes as columns and the products as rows. The storage of the acquired/processed (historical) data is performed in a database, or simply in a file 40, such that an electronic training set is present.

[0068] With the aid of an access device 50, the data of the product table are read in by the processor of an arithmetic unit 60, which is equipped with a memory and with the classification software on the basis of the sparse-grid technique. The classification software calculates a functional relationship (classifier) between the product attributes and the quality class(es). The classifier 80 can be visualized graphically by means of the output device 70, sent to online classification or stored in a database/file 90; in the case of a database, it is possible for the database 90 to be identical to the database 40.

[0069] The use of conventional classification methods encounters two difficulties in the case of automatic generation of the classifier:

[0070] (i) Classical classification methods cannot be applied to the overall data volume because of the large number of products in the historical product database (frequently a few tens of thousands to a few million). Consequently, the classifier ƒ_(c) can be designed only on the basis of a small sample, which is generated, for example, with the aid of a random number generator, and it is therefore of lesser quality.

[0071] (ii) The classifier ƒ_(c) designed by conventional methods is time-consuming in the online classification, and this leads in online use to performance problems, in particular to time delays in the industrial process to be optimized.

[0072] The application of the sparse-grid method solves both problems. The cycle of a sparse-grid classification is illustrated schematically in FIG. 2. The method is explained below with the aid of an example. At the start of classification, the product attributes are present together with the quality class for all products of the historical product database as a training data record 110. In a following step 120, all categorical product attributes, that is to say all attributes without a defined metric such as, for example, the product colour, are transformed into numerical attributes, that is to say attributes with a metric. This can be performed, for example, by allocating a number for each attribute characteristic value or conversion into a block of binary attributes. Thereafter, all attributes are transformed by means of an affine-linear mapping onto the value range [0,1], in order to render them numerically comparable.
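The two preprocessing operations of step 120 can be sketched as follows. This is an illustration under stated assumptions only: the function names one_hot, fit_range and scale are hypothetical, the binary block uses one column per distinct value in sorted order, and the scaling range is taken from the minimum and maximum of the training column.

```python
def one_hot(values):
    """Convert a categorical column into a block of binary attributes,
    one per distinct value (columns in sorted order of the values)."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def fit_range(column):
    """Determine the range [lo, hi] of a numerical training column."""
    return min(column), max(column)


def scale(column, lo, hi):
    """Affine-linear map of [lo, hi] onto [0, 1]; a constant column maps to 0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]
```

Keeping the fitted range separate from the scaling step allows the same mapping to be reused on new data later, so that training and evaluation records are transformed identically.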

[0073] Applying the combination method of the sparse-grid technique, in step 130 the stiffness matrix and the load vector of the discretized system (13) are assembled for each of the L subgrids of the combination method. In this case, the discretization level n is prescribed by the user so as to ensure adequate complexity of the classifier function. Since the number L of the systems (13) of equations together with their dimension is a function only of the discretization level n (and the number of the attributes d), and does not depend on the number of data points (products), the systems (13) of equations can also be set up (and solved) for a very large number of products in a short time. The resulting L systems (13) of equations are solved in step 140 for each subgrid of the combination method by means of iteration methods, generally a preconditioned conjugate gradient method. The coefficients α_(l) define the subclassifier functions ƒ_(l) over the individual grids, the linear combination thereof producing the overall classifier ƒ_(n)^((c)). The latter is therefore present in step 150 by way of the coefficients α_(l). The classifier ƒ_(n)^((c)) describes the relationship between the measured values and the quality class of the inspected products. The higher the function value of the classifier function, the better the quality of the product, and the lower its value, the worse. The classifier therefore permits not only assignment to one of the two quality classes “Acceptable”, “Defective”, but even a graded sorting with reference to the quality probability.

[0074] In the course of the online classification, the data of the production process are acquired by means of measuring sensors and preprocessed by means of the signal preprocessing device (compare 10-30 in FIG. 1). Thereafter, the data are passed on to an arithmetic unit, which is equipped with a processor and a memory and can be identical to the arithmetic unit for automatic generation of the classifier, or be an arithmetic unit different therefrom, and which is equipped with the online classification software based on the sparse-grid technique. In order to simplify the representation, the arithmetic unit in FIG. 1 is used for automatic generation of the classifier and for online classification. It can, however, also be provided that the classifier is generated with the aid of a computing device, and that the classifier generated is then used on another computing device for the online classification. The arithmetic unit used for the online classification must have a suitable interface (not illustrated) for receiving the electronic product attributes data acquired with the aid of the measuring sensors.

[0075] On the basis of the measured product attributes, the arithmetic unit used within the scope of the online classification uses the sparse-grid classifier in conjunction with analysing means (not illustrated) to make a prediction of the quality class for the respective product, and assigns this electronically to the product, it being possible to visualize the quality class by means of an output device and/or to use it directly to initiate actions. Such an action can consist, for example, in that a product $\tilde{x}_{i}$ characterized as “Defective” ($f_{n}^{(c)}(\tilde{x}_{i}) < 0$) is selected automatically and sent for manual appraisal. Moreover, depending on the grade of defectiveness (value of ƒ_(n)^((c))<0), the sorting can be performed into various categories which, in turn, initiate different actions for investigating and removing the defect.
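The decision rule described here can be sketched as follows. The function names, the treatment of a score of exactly zero as “Acceptable”, and the category boundary of −0.5 are assumptions made purely for illustration; the description only states that a negative classifier value marks a product as “Defective” and that the magnitude can be used for grading.

```python
def quality_class(score):
    """Map the classifier value f_n^(c)(x) of one product to a quality class."""
    return "Defective" if score < 0 else "Acceptable"


def defect_category(score, boundaries=(-0.5,)):
    """Return None for acceptable products, otherwise a category index.

    Category 0 holds the mildest defects; each boundary that the score
    falls below moves the product one category further. The boundaries
    are hypothetical cut-off values.
    """
    if score >= 0:
        return None
    return sum(1 for b in boundaries if score < b)
```

Each category index could then be linked to a different follow-up action, for example manual appraisal for category 0 and a more detailed investigation for higher categories.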

[0076] The online classification by means of a sparse-grid method isillustrated schematically in FIG. 3. Each product is characterized byits measured and preprocessed attributes, and therefore corresponds to adata record {tilde over (x)}_(i). The number of the attributes forms, inturn, the dimension d. It follows that, at the start of the onlineclassification, the product attributes are present as an evaluation datarecord 160 for all products to be classified. The number of evaluationdata is frequently only {tilde over (M)}=1 in this case, if the productpresent in the production process is to be classified immediately. Atthe same time, the classifier ƒ_(n)(^(c)) (over the coefficients α_(l)of all L subgrids) is entered from the memory or from a database/file bythe online classification program. In step 170, all categoricalattributes are then transformed into numerical ones, and thereafter a(0,1)-transformation of all attributes is undertaken. This step isperformed with the same methods as in step 120. Thereafter, theindividual subclassifiers ƒ_(l) of all L subgrids are applied to theevaluation data in step 180. The calculated function values are finallycollected for all subgrids in step 190. As a result, there is present instep 200 a vector of the predicted quality classes {tilde over (y)}_(i)for all {tilde over (M)} evaluation data, which vector can be used forthe above-described further processing. Since the number of coefficientsα_(l) and of the subgrids L is independent of the number of trainingdata records and is therefore relatively small, the onlineclassification is performed very quickly, and this renders the describedsparse-grid classification particularly suitable for quality monitoringin mass production.

[0077] The sparse-grid classification was described using the example of the classification of manufactured products. However, it is evident to the person skilled in the art that the electronic data/attributes processed (classified) during the online classification can characterize any desired objects or events, and so the method and the device used for its execution are not restricted to the application described here. Thus, the sparse-grid classification method may also be used, in particular, for automatically evaluating customer, financial and corporate data.

[0078] On the basis of the classification quality achieved and of the speed attained, the described sparse-grid classification method is suitable for arbitrary classification applications. This is shown by the following two benchmark examples.

[0079] The first example is a spiral data record which was proposed by A. Wieland of MITRE Corp. (compare S. E. Fahlman, C. Lebiere, THE CASCADE-CORRELATION LEARNING ARCHITECTURE, Advances in Neural Information Processing Systems 2, Touretzky, ed., Morgan Kaufmann, 1990). The data record is illustrated in FIG. 6A. In this case, 194 data points describe two interwoven spirals; the number of attributes d is 2. It is known that neural networks frequently experience difficulties with this data record, and some neural networks are not capable of separating the two spirals.
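For orientation, a data record of this shape can be generated as sketched below. The constants (97 points per spiral, angle step of pi/16, maximum radius 6.5) follow the commonly distributed version of the two-spirals benchmark and are an assumption; they are not taken from this description. The labels +1 and -1 and the function name `two_spirals` are likewise hypothetical.

```python
import math


def two_spirals(points_per_spiral=97):
    """Return a list of ((x, y), label) pairs describing two interwoven spirals."""
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x, y = radius * math.sin(angle), radius * math.cos(angle)
        data.append(((x, y), +1))    # point on the first spiral
        data.append(((-x, -y), -1))  # mirrored point on the second spiral
    return data


records = two_spirals()
print(len(records), "data points with", len(records[0][0]), "attributes")
```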

[0080] The result of the sparse-grid combination method is illustrated in FIGS. 6A and 6B for λ=0.001 and n=6 or n=8, respectively. The two spirals are separated correctly as early as level 6 (compare FIG. 6A). Only 577 sparse-grid points are required in this case. At level 8 (compare FIG. 6B), with its larger number of sparse-grid points, the form of the two spirals becomes smoother and clearer.
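The number of sparse-grid points can be counted with a short sketch. It assumes the convention of the combination technique in which the sparse grid of level n is the union of all regular subgrids whose level indices are at least 1 and sum to n + d - 1, with boundary points included; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def sparse_grid_points(n, d):
    """Return the set of points of the level-n sparse grid on [0,1]^d."""
    total = n + d - 1
    points = set()
    for l in product(range(1, total + 1), repeat=d):
        if sum(l) != total:
            continue
        # Regular subgrid with mesh width 2^(-l_t) in direction t, boundary included.
        axes = [[Fraction(j, 2 ** lt) for j in range(2 ** lt + 1)] for lt in l]
        points.update(product(*axes))
    return points


print(len(sparse_grid_points(6, 2)))
```

Under these assumptions the count for n = 6 and d = 2 is 577, which matches the figure quoted above. The same convention yields 2817 points for n = 8; that number is computed here and is not stated in this description.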

[0081] As a second example, a 10-dimensional test data record with 5 million data points as training data and 50 000 data points as evaluation data was generated for the purpose of measuring the performance of the sparse-grid classification method. This was done with the aid of the data generator DatGen (compare G. Melli, DATGEN: A PROGRAMME THAT CREATES STRUCTURED DATA. Website, http://www.datasetgenerator.com). The call was datgen-r1X0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0/200,R,O:0-R2-C2/6-D2/7-Ti10/60-O5050000-p-e0.15.

[0082] The results are illustrated in Table 1.

[0083] The measurements were carried out on a Pentium III 700 MHz machine. The highest storage requirement (for level 2 with 5 million data points) was 500 Mbytes. The value of the regularization parameter was λ=0.01.

[0084] The classification quality on the training and evaluation sets (in per cent) is shown in the third and fourth columns of Table 1. The last column contains the number of iterations of the conjugate gradient method used for solving the systems of equations. The overall computing time scales in an approximately linear fashion with the number of data points and is moderate even for these very large data records.

TABLE 1

Level   Number of      Training      Evaluation    Computing   Number of
        data points    quality (%)   quality (%)   time (s)    iterations
1       50000          98.8          97.2          19          47
        500000         97.6          97.4          104         50
        5 million      97.4          97.4          811         56
2       50000          99.8          96.3          265         592
        500000         98.6          97.8          1126        635
        5 million      97.9          97.9          7764        688
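The scaling statement can be checked directly against the computing times in Table 1. The following sketch only re-evaluates the tabulated figures; the dictionary layout and the function name are hypothetical.

```python
# Computing times in seconds from Table 1, keyed by level and number of data points.
TIMES = {
    1: {50_000: 19, 500_000: 104, 5_000_000: 811},
    2: {50_000: 265, 500_000: 1126, 5_000_000: 7764},
}


def growth_factors(times):
    """Ratio of computing times between consecutive data set sizes."""
    sizes = sorted(times)
    return [times[b] / times[a] for a, b in zip(sizes, sizes[1:])]


for level, times in TIMES.items():
    factors = ", ".join(f"{g:.1f}" for g in growth_factors(times))
    print(f"level {level}: time grows by factors {factors} per tenfold increase in data")
```

Each tenfold increase in the number of data points raises the computing time by a factor below ten, so the measured growth is at most linear in the amount of data.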

[0085] The features of the invention disclosed in the above description, the drawing and the claims can be significant both individually and in any desired combination for the implementation of the invention in its various embodiments.

1. Device for generating a classifier for automatically sorting objects, which are respectively characterized by electronic attributes, in particular a classifier for automatically sorting manufactured products into up-to-standard products and defective products, having a storage device for storing a set of electronic training data, which comprises a respective electronic attribute set for training objects, and having a processor device for processing the electronic training data, a dimension (d) being determined by the number of attributes in the respective electronic attribute set, characterized in that the processor device has discretization means for automatically discretizing a function space (V), which is defined over the real numbers ($\mathbb{R}^d$), into subspaces ($V_N$, N = 2, 3, ...) by means of a sparse grid technique and processing the electronic training data with the aid of a processor device.
2. Device according to claim 1, characterized in that the processor device has evaluation means for automatically evaluating the classifier generated during processing of the electronic training data, in order to apply the classifier to a set of electronic evaluation data such that the quality of the classifier can be evaluated.
3. Device according to claim 1, characterized by interface means for coupling an input device for user inputs and/or for coupling a graphics output device.
4. Method for generating a classifier for automatically sorting objects, which are respectively characterized by electronic attributes, in particular a classifier for automatically sorting manufactured products into up-to-standard products and defective products, the method having the following steps: transmitting a set of electronic training data, which comprises a respective electronic attribute set for training objects, from a storage device to a processor device, a dimension (d) being determined by the number of attributes in the respective electronic attribute set; processing the electronic training data in the processor device, a function space (V) defined over $\mathbb{R}^d$ being electronically discretized into subspaces ($V_N$, N = 2, 3, ...) with the aid of discretization means with the use of a sparse grid technique; forming the classifier as a function of the processing of the electronic training data in the processor device; and electronically storing the classifier formed.
5. Method according to claim 4, characterized in that the classifier formed is automatically applied to a set of electronic evaluation data for evaluating the quality of the classifier, in order to form quality parameters which are indicative of the quality of the classifier.
6. Method according to claim 4, characterized in that a combination method of the sparse grid technique is applied for the electronic discretization of the function space (V).
7. Use of a device according to one of claims 1 to 3 for the purpose of executing a data mining method.
8. Use of a method according to one of claims 4 to 6 for the purpose of executing a data mining method.
9. Device for online sorting of objects which are characterized by respective electronic attributes, in particular of manufactured products into up-to-standard products and defective products, with the aid of an electronic classifier generated using the sparse grid technique, the device having: reception means for receiving characteristic features of the objects to be sorted in the form of electronic attributes; and a processor device with: analysing means for online analysis of the electronic attributes with the aid of the classifier; and assignment means for electronically assigning the objects to be sorted to one of a plurality of sorting classes as a function of the automatic online analysis.
10. Method for online sorting of objects which are characterized by respective electronic attributes, in particular of manufactured products into up-to-standard products and defective products, by means of an electronic classifier generated using the sparse grid technique, the method having the following steps: online detection of characteristic features of the objects to be sorted, the features being in the form of electronic attributes; automatic online analysis of the electronic attributes using the classifier with the aid of a processor device; and assignment of the objects to be sorted to one of a plurality of sorting classes as a function of the automatic online analysis.